Llama 4 Support #1508

Qubitium · 2025-04-06T01:13:41Z

Llama 4 attention mask/causal mask calculations currently have bugs. We are bypassing them by forcing batch=1, padding, and removing attention_masks altogether until the bugs are fixed upstream.
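A minimal sketch of what such a workaround could look like on the calibration data side. This is not the code in this PR: `to_single_sample_batches` is a hypothetical helper, and it assumes one reading of the note above, namely that each sample is fed alone (batch=1) with padded positions dropped so that no attention mask is needed.

```python
# Hypothetical helper for illustration only; not part of GPTQModel or transformers.
def to_single_sample_batches(samples):
    """Turn tokenized samples into batch-size-1 inputs without attention masks.

    Each sample is a dict with "input_ids" (list of ints) and, optionally,
    "attention_mask" (list of 0/1 of the same length).
    """
    batches = []
    for sample in samples:
        ids = sample["input_ids"]
        mask = sample.get("attention_mask")
        if mask is not None:
            # Assumption: padded positions are dropped, so the single
            # sequence needs no mask at all.
            ids = [tok for tok, keep in zip(ids, mask) if keep]
        # One sequence per batch: shape [1, seq_len], no "attention_mask" key.
        batches.append({"input_ids": [ids]})
    return batches


calibration = [
    {"input_ids": [11, 12, 13, 0, 0], "attention_mask": [1, 1, 1, 0, 0]},
    {"input_ids": [21, 22], "attention_mask": [1, 1]},
]
single = to_single_sample_batches(calibration)
```

With one unpadded sequence per forward pass, the model can fall back to its default causal masking instead of combining it with a user-supplied mask.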

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

Qubitium · 2025-04-06T09:51:40Z

We are missing layers/modules. The size difference between the quantized and non-quantized model is only 6.2%, which is not normal. At 4-bit, the difference should be closer to 70%.
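A rough back-of-the-envelope check of those numbers, as a sketch under stated assumptions: the baseline is 16-bit weights, quantized weights take exactly 4 bits, and the overhead of scales and zero points is ignored. The function names are made up for this illustration.

```python
def size_reduction(quantized_fraction, bits=4, baseline_bits=16):
    """Fractional size reduction when `quantized_fraction` of the weights
    go from `baseline_bits` to `bits` and the rest stay untouched."""
    return quantized_fraction * (1 - bits / baseline_bits)


def quantized_fraction_from_reduction(reduction, bits=4, baseline_bits=16):
    """Invert size_reduction: which fraction of the weights must have been
    quantized to observe the given reduction?"""
    return reduction / (1 - bits / baseline_bits)


# Everything quantized: upper bound under these assumptions.
print(size_reduction(1.0))  # 0.75

# Observed 6.2% reduction: implied share of weights that were quantized.
print(round(quantized_fraction_from_reduction(0.062), 3))  # 0.083
```

Under these assumptions a fully quantized model would shrink by about 75%, and a bit less once quantization metadata and unquantized layers are counted. A 6.2% reduction would mean only around 8% of the weights were actually quantized, which is consistent with most modules having been skipped.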

Qubitium · 2025-04-06T13:28:53Z

Need to fix this. MoE modules are defined as parameters rather than linear layers, so they are skipped during quantization.
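To illustrate why such modules get skipped, here is a small sketch with toy classes. `ToyExperts` and `ToyLayer` are stand-ins invented for this example, not the real Llama 4 classes, and the scan functions are hypothetical rather than GPTQModel's actual module discovery. The point is only that a scan for `nn.Linear` instances never sees weights held as bare `nn.Parameter` tensors.

```python
import torch
from torch import nn


class ToyExperts(nn.Module):
    """Toy MoE expert block that stores its weights as raw parameters."""

    def __init__(self, num_experts=2, hidden=4, inter=8):
        super().__init__()
        self.gate_up_proj = nn.Parameter(torch.zeros(num_experts, hidden, 2 * inter))
        self.down_proj = nn.Parameter(torch.zeros(num_experts, inter, hidden))


class ToyLayer(nn.Module):
    """Toy decoder layer: one regular linear projection plus the expert block."""

    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(4, 4, bias=False)
        self.experts = ToyExperts()


def find_linear_modules(model):
    """Names of modules that a Linear-only quantizer would pick up."""
    return [name for name, m in model.named_modules() if isinstance(m, nn.Linear)]


def find_bare_weight_parameters(model):
    """Names of 2D+ parameters owned directly by non-Linear modules."""
    found = []
    for mod_name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            continue
        for p_name, p in module.named_parameters(recurse=False):
            if p.dim() >= 2:
                found.append(f"{mod_name}.{p_name}" if mod_name else p_name)
    return found


layer = ToyLayer()
print(find_linear_modules(layer))          # ['q_proj']
print(find_bare_weight_parameters(layer))  # ['experts.gate_up_proj', 'experts.down_proj']
```

In this toy layout the expert weights hold most of the parameters yet never show up in the `nn.Linear` scan, which matches the small size reduction reported above.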

https://github.com/user-attachments/assets/ac3b3fd4-50eb-420e-929f-c6ae1f2c8b11

Qubitium and others added 7 commits April 6, 2025 01:13

update transformers for llama 4

f6170b0

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>

add Llama4GPTQ

bb89c76

use loader AutoModelForImageTextToText

0ebfb17

cleanup

006d7a6

fix qkvo forward when every 4 layer

26f074a

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

Update llama4.py

15e8c04

add support_batch_quantize

10d330f

Qubitium marked this pull request as ready for review April 6, 2025 08:39

Qubitium changed the title ~~llama 4~~ Llama 4 Support Apr 6, 2025

LRL-ModelCloud and others added 2 commits April 6, 2025 16:39

add warning

4b18850

Update README.md

e1ef1f7

Qubitium mentioned this pull request Apr 6, 2025

Llama 4: eager attention results in wrong casual mask shape huggingface/transformers#37322

Closed

4 tasks

LRL-ModelCloud added 3 commits April 6, 2025 17:07

fix data

fd06d6a

Merge remote-tracking branch 'main/llama-4' into llama-4

c7c057a

fix input_ids

b650d16

Qubitium marked this pull request as draft April 6, 2025 09:50

LRL-ModelCloud added 2 commits April 6, 2025 20:35

update llama4 modules

250060b

cleanup

35266f6

LRL-ModelCloud and others added 2 commits April 6, 2025 21:34

Revert "update llama4 modules"

44fc282

This reverts commit 250060b.

Merge remote-tracking branch 'main/main' into llama-4

ed307cf

Qubitium assigned LRL2-ModelCloud Aug 21, 2025

Merge remote-tracking branch 'main/main' into llama-4

c48c054

Qubitium mentioned this pull request Aug 25, 2025

GPT-OSS supported? #1681

Open
